Add ncu report analyzer #2497
base: main
import ncu_report

# save all kernels' metrics. {metric_name: [kernel1_metric_value, kernel2_metric_value, ...]}
results = defaultdict(list)
@xuzhao9
Any suggestions on how we should save this data? We need to keep the metric results for each kernel, but we also need aggregated results, right? For example, the memory traffic (both read and write) for the whole operator should be the sum of all kernels' read and write traffic.
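One possible layout is a minimal sketch that keeps the raw per-kernel lists and the operator-level aggregates in two separate dicts, so derived keys never get mixed into the per-kernel lists. It assumes each kernel's memory_traffic is stored as a (read_bytes, write_bytes) pair, which mirrors how the later revision indexes item[0]/item[1]. The function name collect and its input shape are hypothetical.

```python
from collections import defaultdict


def collect(kernel_samples):
    """Hypothetical helper. kernel_samples: iterable of per-kernel dicts {metric_name: value}.

    Returns (per_kernel, aggregated):
      per_kernel  -> {metric_name: [kernel1_value, kernel2_value, ...]} in launch order
      aggregated  -> operator-level totals, kept apart from the raw lists
    """
    per_kernel = defaultdict(list)
    for sample in kernel_samples:
        for name, value in sample.items():
            per_kernel[name].append(value)

    # Assumption: memory_traffic is a (read_bytes, write_bytes) pair per kernel.
    traffic = per_kernel.get("memory_traffic", [])
    aggregated = {
        "memory_traffic_read_sum": sum(read for read, _ in traffic),
        "memory_traffic_write_sum": sum(write for _, write in traffic),
    }
    return dict(per_kernel), aggregated
```

With this split, per-kernel reporting reads only per_kernel, and operator-level reporting reads only aggregated.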
Branch updated from 2219090 to 3845308.
memory_traffic_write = [item[1] for item in results["memory_traffic"]]
results["memory_traffic_read_sum"] = sum(memory_traffic_read)
results["memory_traffic_write_sum"] = sum(memory_traffic_write)
results["weighted_fp32_arithmetic_intensity"] = (
We need a better way to manage these hidden metrics.
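One way to manage hidden metrics is a small registry in which every user-facing metric declares the raw ncu metrics it depends on and how to derive its value. The profiler can then request only the union of those raw metrics. This is a hypothetical sketch, not the PR's design: DerivedMetric, REGISTRY, and required_ncu_metrics are illustrative names. The raw metric names are ones commonly used for FP32 FLOP counting and DRAM traffic in Nsight Compute, but whether the PR uses exactly these is an assumption.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

# Assumed raw ncu metrics (commonly used for roofline-style analysis).
FP32_FLOP_METRICS = [
    "smsp__sass_thread_inst_executed_op_fadd_pred_on.sum",
    "smsp__sass_thread_inst_executed_op_fmul_pred_on.sum",
    "smsp__sass_thread_inst_executed_op_ffma_pred_on.sum",
]
DRAM_METRICS = ["dram__bytes_read.sum", "dram__bytes_write.sum"]


@dataclass(frozen=True)
class DerivedMetric:
    """Hypothetical: a user-facing metric plus the raw ncu metrics it needs."""
    raw_metrics: List[str]
    compute: Callable[[Dict[str, float]], Any]


def _fp32_flops(raw: Dict[str, float]) -> float:
    fadd, fmul, ffma = (raw[m] for m in FP32_FLOP_METRICS)
    return fadd + fmul + 2 * ffma  # a fused multiply-add counts as two FLOPs


def _dram_traffic(raw: Dict[str, float]):
    return raw[DRAM_METRICS[0]], raw[DRAM_METRICS[1]]  # (read_bytes, write_bytes)


REGISTRY: Dict[str, DerivedMetric] = {
    "memory_traffic": DerivedMetric(DRAM_METRICS, _dram_traffic),
    "arithmetic_intensity": DerivedMetric(
        FP32_FLOP_METRICS + DRAM_METRICS,
        lambda raw: _fp32_flops(raw) / sum(_dram_traffic(raw)),
    ),
}


def required_ncu_metrics(requested: List[str]) -> List[str]:
    """Deduplicated raw metrics for the requested derived metrics, in first-seen order."""
    ordered: Dict[str, None] = {}
    for name in requested:
        for raw_name in REGISTRY[name].raw_metrics:
            ordered.setdefault(raw_name, None)
    return list(ordered)
```

Joining the result with commas gives the form that ncu's --metrics option accepts, so adding a new derived metric only requires a new registry entry.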
This PR adds an ncu report analyzer that parses the profiled ncu report. It also adds two metrics, memory_traffic and arithmetic_intensity. To avoid excessive profiling overhead, we collect only the ncu metrics these analyses need. This PR is part of the operator benchmarking plan.
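For an operator made of several kernels, arithmetic intensity (FLOPs per byte of memory traffic) should be computed from totals rather than by averaging per-kernel values. This is because an average of per-kernel intensities weighted by each kernel's byte count reduces exactly to total FLOPs divided by total bytes. The sketch below assumes this is roughly what weighted_fp32_arithmetic_intensity aims at; that formula is truncated in the diff above, so treat this reading as an assumption. The function names and the (flops, bytes) input shape are hypothetical.

```python
import math


def operator_arithmetic_intensity(kernels):
    """kernels: list of (flops, bytes) per kernel. FLOPs per byte for the whole operator."""
    total_flops = math.fsum(f for f, _ in kernels)
    total_bytes = math.fsum(b for _, b in kernels)
    return total_flops / total_bytes


def traffic_weighted_intensity(kernels):
    """Average of per-kernel intensities, each weighted by that kernel's share of bytes."""
    total_bytes = math.fsum(b for _, b in kernels)
    return math.fsum((f / b) * (b / total_bytes) for f, b in kernels)
```

For kernels [(100.0, 50.0), (100.0, 150.0)], both functions give 1.0 FLOP/byte, while a plain unweighted mean of the per-kernel intensities (2.0 and about 0.67) would give about 1.33. The unweighted mean overstates intensity when a low-intensity kernel dominates the traffic.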
Example commands:
Example output: